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Abstract 

This  paper  introduces  a  class  of  robust  lag-k  smoothers  based  on  simple  low 
order  Markov  models  for  the  Gaussian  trend-like  component  of  signal  plus  non- 
Gaussian  noise  models.  The  k^  order  Markov  models  are  of  the  difference 
form  A *xt  =  e<  where  Qx)  =  x&-  xt-t  and  Cj  is  a  zero-mean  white  Gaussian 
noise  process  with  variance  a The  nominal  additive  noise  is  a  zero-mean  white 
Gaussian  noise  sequence  with  variance  cr^ ,  while  the  actual  additive  noise  is  non- 
Gaussian  with  an  outliers  generating  distribution,  e.g.,  (1  —  -y)iV(0, cr^)  +  -yff. 
This  setup  is  particularly  appropriate  for  radar  glint  noise.  Implementation  of 
the  smoothers  requires  estimation  of  the  two  parameters  a l  and  a *  This  is 
accomplished  using  a  robustified  maximum  likelihood  approach.  Application 
to  both  artificial  data  sets  and  to  glint  noise  data  reveals  that  the  approach  is 
quite  effective.  We  briefly  discuss  the  choices  of  lag  k  for  the  smoothers  and 
also  briefly  study  the  sensitivity  of  our  approach  to  model  mismatch. 
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1  Introduction 


The  motivation  for  this  paper  is  the  desire  to  construct  robust  fixed  lag  smoothers 
which  work  well  on  radar  glint  noise,  and  which  have  a  simple  structure  with  at 
most  a  few  parameters  to  estimate.  We  achieve  this  end  by  both  generalizations  and 
specialization  of  known  methods  (Martin,  1979;  Martin  and  Thompson,  1982)  for 
constructing  robust  filters  which  are  insensitive  to  outliers. 

One  generalization  consists  of  building  on  the  known  robust  filter  algorithms  to 
obtain  robust  fixed-lag  smoothers.  The  emphasis  here  is  on  fixed-lag  smoothing,  as 
opposed  to  the  use  of  existing  robust  fixed  interval  smoothers  (e.g.,  as  in  Martin, 
1979;  Martin  and  Yohai,  1985),  because  we  ultimately  wish  to  use  such  smoothers  in 
real  time  applications. 

A  second  generalization  is  that  of  estimating  the  model  parameters,  including  the 
standard  deviation  of  a  nominally  Gaussian  additive  noise  model,  using  a  robustified 
maximum  likelihood  method. 

The  specialization  is  guided  by  our  prior  knowledge  about  radar  glint  noise.  Such 
noise  contains  both  a  low  frequency  approximately  Gaussian  trend-like  core  compo¬ 
nent,  and  a  broad-band  highly  non-Gaussian  component  associated  with  glint  noise 
“spikes”  or  outliers.  The  goal  of  the  robust  smoothing  is  to  produce  a  smooth  ver¬ 
sion  of  the  trend-like  core  component.  Prior  experiments  (Martin  and  Thomson, 
1982)  with  robust  autoregressive  model  fitting  and  robust  smoothing  indicate  that 
the  trend-like  component  is  well  modeled  by  l'\  2nd  or  at  most  3rd  order  autore¬ 
gressive  models  with  parameters  on  the  boundary  of  nonstationarity.  More  precisely, 
state  variable  models  wherein  1st,  2nd  or  3rd  differences  of  the  state  variable  are 
equal  to  a  Gaussian  white  noise  innovations  process  have  appeared  to  work  quite  well 
(correspondingly  the  state  variable  is  once,  twice  or  thrice  “intergrated”  white  noise). 
Thus  our  specialization  is  to  use  such  simple  state  variable  models  to  model  the  trend¬ 
like  component.  Correspondingly,  it  is  not  necessary  to  estimate  the  autoregressive 
parameters  in  the  state  variable  model,  only  an  estimate  of  the  innovations  variance 
er2  for  the  state  process  is  required. 
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It  is  assumed  that  there  is  additive  white  noise  vt  which  is  well  modeled  by  the 
mixture  model 

Fv  =  (i-7)m*02)  +  7  h  (i) 

with  Gaussian  central  component  N(0,cr2),  and  with  contamination  distribution  If. 
The  latter  is  a  heavy-tailed  outliers  generating  distribution  which  accounts  for  the 
glint  noise  spikes. 

Past  experience  shows  that  it  is  not  terribly  important  to  estimate  the  fraction 
of  contamination  7  -  presumption  of  any  value  in  the  range  .1  to  .25  will  usually  do 
quite  well.  However,  estimates  of  <r2  is  quite  important,  since  a\  and  a\  determine 
the  degree  of  smoothing  of  our  smoothers,  just  as  in  the  case  of  linear  smoothers. 
Thus  there  are  just  two  unknown  parameters,  and  0 2,  to  be  estimated. 

There  is  also  a  “tuning  constant”  c  which  controls  the  tradeoff  between  robustness 
and  efficiency  for  the  robust  smoother.  This  constant  can  be  set  in  advance  using  the 
study  of  Martin  and  Su  (1985)  for  guidance. 

The  paper  is  organized  as  follows.  Section  2  reviews  robust  filters  for  the  state- 
variable  setup  of  interest.  Section  3  presents  robust  fixed-lag  smoothers.  Section  4 
discusses  the  robust  estimation  of  the  unknown  model  parameters  a2  and  cr2.  Section 
5  presents  the  application  of  the  method  to  the  radar  glint  noise  data.  Section  6 
discusses  the  choice  of  lag.  Section  7  considers  the  issue  of  sensitivity  of  the  method 
to  model  mismatch.  Section  8  contains  some  concluding  remarks. 

2  The  State  Variable  Setup  and  Robust  Filters 

2.1  The  State  Variable  Setup 

Suppose  that  the  scalar  observations  ij\.....yn  are  generated  by  the  following  state- 
variable  system: 


xt  =  +  et  (2) 

ijt  =  Hxt  +  vt. 
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where  xt  and  €t  have  dimension  p,  $  is  a  p  x  p  matrix  and  H  is  a  1  x  p  matrix. 
It  is  also  assumed  that  xt  is  independent  of  future  et1  and  that  et,  vt  are  mutually 
independent  sequences  which  are  individually  independent  and  identically  distributed 
(i.i.d).  The  innovation  process  et  for  the  state  equation  is  assumed  to  be  a  white  noise 
zero  mean  Gaussian  process  with  covariance  matrix  Q.  The  additive  noise  process  vt 
is  assumed  to  be  zero  mean  white  noise  non-Gaussian  mixture  distribution  given  by 
(!)• 

An  example  of  this  system  is  the  pth  order  autoregression 

p 

xt  =  £  ^tXt-x  +  (3) 

i=i 

This  model  has  the  above  state- variable  form  with 

^  <pi  fa  ...  4>p-\  4>p 

1  0  ...  0  0 

$=  0  I  ...  0  0 

^  0  o  ...  1  0 

Xt  =  (xt,  X<_j,  .  .  .  , 

H  =  (1,0,... ,0) 
ef  =  (et,0,...,0). 

The  notation  to  be  used  throughout  is  as  follows.  Denote  the  first  t  observations 
by  Y*  =  (yi,...,yt).  The  state  prediction  density  is  /(a;t+i|yf)  is  the  conditional 
density  of  x<+1  given  Y f.  This  density  is  assumed  to  exist  for  t  >  1.  The  observation 
prediction  density  is  /(yt+i|V'*).  The  conditional-mean  estimate  of  xt  given  Y*  is 
written  £t  =  Eix^Y*)  and  the  conditional-mean  predictor  of  xt+i  given  Y*  is  written 
X(+i  =  E(X(+i|K‘).  The  conditional- mean  estimate  of  xt  given  Yt+t  for  a  fixed 
constant  l  >  0,  is  written  x\+l  =  E{xt\Yt+l).  This  estimate  is  called  a  fixed-lag 
(lag  l)  smoother.  Assuming  they  exist,  these  conditional-means  are  the  minimum 
mean-squared-error  linear  estimates  given  Y*  or  yt+i  (Meditch,  1969). 
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2.2  The  pth  Order  Difference  Models 

In  this  subsection,  we  describe  the  pth  order  difference  model  and  its  state  variable 
setup.  The  pth  order  difference  model  is  defined  as 

Apxt  =  et 


where 


A1!*  —  Xt  —  Xt-1 

Apxt  =  Ap~lxt  —  Ap-1it_i, 


and  et  has  a  white  noise  Gaussian  distribution  with  mean  0  and  variance  a These 
models  are  the  nonstationary  limiting  case  of  stationary  autoregressive  models  as  the 
coefficients  tend  to  particular  points  on  the  boundary  of  stationarity.  For  example, 
the  2nd  order  (p  =  2)  differences  model  can  be  presented  as 


Xt  =  2x,_i  -  Xf—2  + 


which  is  the  same  as  in  (3)  with  <pi  =  2  and  <$>%  =  —  1.  So  the  state  variable  setup  for 
the  pth  order  difference  model  is  similar  to  (4)  with  4>p  appropriately  specified. 

As  we  mention  in  the  introduction,  the  difference  models  with  order  p  <  3  are 
used  for  our  fixed-lag  smoothers.  We  now  present  the  state  variable  setup  for  the 
1**,  2nd  and  3rd  order  difference  models.  The  $,  H,  e  and  x  for  each  order  are  as 
follows. 

Case  p  —  1: 


$  =  1 
Xt  =  X, 

H  =  1 
et  =  tf 

Case  p  =  2: 
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Case  p  =  3: 


xt 

= 

(if,  Xf_i ) 

H 

= 

(TO) 

‘I 

= 

(«*,0). 

1  3 

-3  1  N 

$  = 

1 

0  0 

u 

i  oj 

xj  =  (xt,xt-i,xt-2) 
H  =  (1,0,0) 
ej  -  (et,0,0). 


The  choice  of  such  state  variable  models  for  smoothing  problems  is  motivated  by 
the  connection  between  continuous  time  versions  of  such  models  and  spline  smoothing 
(see  for  example,  Wecker  and  Ansley,  1983). 


2.3  Masreliez’s  Filter 

We  now  discuss  the  methods  for  estimating  the  conditional  means  of  xt  given  Yt  in 
the  system  (2).  When  et  and  vt  in  system  (2)  are  Gaussian,  the  straight  forward 
computation  of  =  E{xt\Yt)  by  any  one  of  a  variety  of  techniques  (Jazwinski, 
1970)  yields  the  Kalman  filter  recursion  equations.  Unfortunately,  the  explicit  and 
exact  calculation  of  xt  in  closed  form  is  virtually  intractable  when  vt  is  non-Gaussian, 
except  in  a  few  special  cases,  e.g.,  as  in  the  case  of  stable  random  variable  (see  Stuck, 
1976).  However,  there  is  a  simplifying  assumption,  discovered  by  Masreliez  (1975), 
which  allows  one  to  make  an  exact  calculation  of  xt.  Masreliez’s  assumption  is  that 
the  state  predictor  density  is  Gaussian 

f(xt\Yt-l)  =  N(xt-xtt-\Mt)  (5) 

where  N(-;p,  E)  denotes  the  multivariate  normal  density  with  mean  p  and  covariance 
matrix  E,  and 

A/e  =  E{(xt  -  x;-l)(*e  -  x;-1)T|Kf-1}-  (6) 
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is  the  conditional  covariance  matrix  for  the  state  prediction  error.  Under  this  as¬ 
sumption,  Masreliez  (1975)  showed  that  xt  =  k  >  1  can  be  obtained  by 

the  recursions 


it  =  x\~l +MtHT\t[yt)  (7) 

A/t+i  =  <f>Pt$T  +  Q  (8) 

Pt  =  Mt  -  MtHT%(yt)HMt  (9) 

where 

*«(».)  =  (10) 

oyt 

is  the  “score”  function  for  the  observation  prediction  density  /K(y«|^t_1))  Q  is  the 
covariance  matrix  of  et,  xj-1  =  $xt_j  and 

ny.)  =  (11) 

Comments  and  discussions  on  the  appropriateness  of  Masreliez’s  assumption  and 
the  validity  of  the  representation  (7)-(  10)  can  be  found  in  Martin  (1979),  and  Fraiman 
and  Martin  (1988). 


2.4  ACIM-Type  Robust  Filter  for  Autoregressions 


Utilizing  the  Masreliez  result,  Martin  (1979)  suggests  an  approximate  conditional 
mean  (ACM)  type  robust  filter  described  as  follows.  Let  cr\  be  the  variance  of  the 
Gaussian  component  of  the  distribution  (1)  for  vt,  and  let  $  be  a  robustifying  psi 
function  as  in  the  robustness  literature  (see  for  example,  Huber  1981;  Hampel  et  al 
1986).  Set 

s2t  =  mik  +  <7q 


where  is  the  1-1  element  of  Aft  the  state  prediction  error  covariance  matrix.  Then 
the  recursions  (7)-(9)  are  replaced  by 


it 

Rf+1 

Pt 


+  ^t-s(^c(  —  ) 

•*t  st 

<PPt<PT  +  Q 
Mt  -  «•(-) — ■ 


(12) 

(13) 

(14) 
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where  mt  is  the  first  column  of  Mt  and  rt  is  the  observation  prediction  residual 


r  i  =  y  e  i'r’ 


T wo  possible  choices  of  w  are 


w(rt)  =  $'c(rt) 


by  analogy  with  (11),  or 


w(rt) 


Vein) 

n 


(15) 

(16) 


There  are  many  possible  choices  for  the  psi-function  in  the  robust  filtering  context. 
These  choices  are  described  in  Martin  (1979,  1981),  Martin  and  Su  (1985).  The  main 
qualitative  characteristics  of  a  good  robustifying  rp  is  that  0  should  be  bounded  and 
“redescending”.  For  example,  Hampel’s  two-part  redescending  function  defined  as 


^ HA.c(V ) 


y 

a(c-y)/{a  -  1) 
-a(c  +  y)/(l  -  a) 


fyf  <  qc 
ac  <  y  <  c 
-c  <  y  <  —ac 


l  o 


|y|  >  c 


(17) 


has  these  features.  Throughout  this  paper,  we  use  ^ha,c  with  ac  =  1.6  and  c  =  4.0. 
These  are  values  suggested  by  the  study  of  Martin  and  Su  (1985). 

We  make  the  choice  ( 16)  in  this  paper  since  it  has  a  continuity/ resistance  rationale 
(see  Martin  and  Su,  1985):  the  w  in  (16)  is  continuous  for  piecewise  linear  'Pc,  whereas 
the  w  in  (15)  is  discontinuous. 


3  Robust  Fixed-Lag  Smoothers 

3.1  Robust  Fixed  Lag  Smoothers 

The  above  robust  filter  can  be  used  to  obtain  a  robust  fixed  lag  (lag  /,  for  some  /  <  n) 
smoother  for  quite  general  state  equations  by  augmenting  the  state  equation  (see  for 


example,  Anderson  and  Moore,  1979).  In  this  approach,  the  state- variable  system  (2) 
is  augmented  as  follows: 


zt  =  $>!*«_!  + 

Vt  =  HxZt  +  vt,  (18) 


where 


'  $>  0  ...  0  0  ^ 

1  0  ...  0  0 

01...  00 


\  0  0  ...  /  0  / 


T 

zt  =  (Xt,  .  .  .  ,  *t_/) 

Hi  =  (H%  0,...,0) 

(ej)r  =  (ef, 0, . . .  ,0). 

The  vectors  z4,  e{  have  dimension  p(l  +1).  is  a  p(l  +  1)  x  p(l  +  1)  matrix.  Hi  is 
a  1  x  p(l  +  1)  matrix.  So  for  0  <  i  <  /,  the  robust  fixed  lag  (i)  smoother  is  just  the 
(i  4-  1)‘*  element  of  zt  =  E(zt\Yt).  The  conditional-mean  vector  zt  is  obtained  by 
the  robust  filter. 


The  state-augmentation  approach  to  fixed  lag  smoother  construction  results  in 
state  vectors  and  matrices  with  dimension  proportional  to  the  desired  lag.  In  par¬ 
ticular,  the  dimensions  increase  by  p  if  the  lag  is  increased  by  1.  In  general,  this 
may  become  computationally  burdensome.  However,  this  problem  can  be  avoided  for 
special  case  of  autoregression  type  state  equations  since  there  is  a  lower  dimensional 
augmentation  in  such  cases.  For  the  pth  otder  autoregression  (3),  including  the  pth 
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order  difference  model,  with  lag  l  >  p,  we  can  use  (17)  with 


( fa 

fa 

...  fa 

o  ... 

0 

0  \ 

1 

0 

...  0 

o  ... 

0 

0 

0 

i 

...  0 

o  ... 

0 

0 

0 

0 

...  1 

0  ... 

0 

0 

0 

0 

...  0 

1  ... 

0 

0 

V  0 

0 

...  0 

0  ... 

1 

0 , 

zT 

zt 

= 

( >  £t— 

, . .  .  ,  £t_ 

l) 

Hx 

= 

(1,0,.. 

,0) 

(«!)T 

= 

(eT»o,  • 

.,0). 

(19) 


The  vectors  zu  Hx,  ej  have  dimensions  (/  +  1),  and  <J>i  is  a  (/  +  1)  x  (/  +  1)  matrix. 
With  this  augmentation  method,  the  dimension  increases  by  units  of  order  1  if  the 
lag  is  increased  by  1.  More  detail  on  reducing  dimensionality  can  be  found  in  Moore 
(1973),  Anderson  and  Moore  (1979). 


3.2  The  pth  Order  Difference  Model  Case 

For  the  pth  order  difference  model,  we  can  obtain  the  fixed-lag  smoothers  (lag  1  to 
lag  l)  by  using  the  setup  in  (18).  Recall  that  the  pth  order  difference  model  is  just  the 
pth  order  autoregressive  model  with  fa  ,...,<pp  appropriately  chosen.  To  illustrate  the 
approach  more  specifically,  we  present  the  detail  of  how  to  obtain  the  lag-2  and  lag-5 
smoothers  with  the  2nd  order  difference  model.  This  would  hopefully  help  to  explain 
the  method  more  clearly.  Using  the  setup  (18),  the  2nd  order  difference  model  for  lag 


5  smoother  can  be  presented  by 


f  2  -1  0  0  0  0  N 

1  0  0  0  0  0 

0  1  0  0  0  0 

0  0  1  0  0  0 

0  0  0  1  0  0 

^  0  0  0  0  1  0  t 


zj  —  (Xj,  X(_x,  X<_2,  X<_3,  Xt-A,  Xt-s) 

Hy  =  (1,0, 0,0, 0,0) 

(el)T  =  (e„  0,0, 0,0,0). 

Suppose  that  the  parameters  cr20,  a\  and  the  tuning  constant  c  are  known,  we  can 
obtain  the  conditional  mean  of  the  vector  random  variable  zt  conditioned  on  the 
observations  y2,...,yt  by  using  the  ACM-type  filter  algorithm  described  above.  The 
lag-2  smoother,  x(_2  =  £(xt_2|T‘),  is  then  the  3rd  component  of  zt.  The  lag-5 
smoother  is  then  the  6th  (last)  component  of  zt. 

However,  the  parameters  a\  and  a2  are  unknown  and  need  to  be  estimated.  In 
the  next  section,  a  method  for  estimating  these  parameters  with  pth  order  difference 
model  is  discussed. 


4  Parameter  Estimation 


4.1  Robust  Maximum  Likelihood  Type  Estimation 


Suppose  that  the  observations  yx , ...,  yn  are  generated  by  the  system  (2)  with  Gaussian 
additive  noise,  i.e.,  noise  distributed  as  (1)  with  7  =  0.  The  Gaussian  maximum 
likelihood  estimates  of  the  parameters  er*  and  <x2  are  obtained  by  minimizing  the  log 
likelihood  function 


=  nlog(2jr)  +  £  log  s2(a20,  cr2)  + 


1=1 


n 

V' 

1=1 


(yj  - 


(20) 
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with  respect  to  <r\  and  of,  where  y'~x  and  s*  are  obtained  by  the  usual  Kalman  filter 
recursions  (obtained  from  the  robust  filter  with  c  =  oc). 

However,  the  initial  value  Pq  is  undefined  for  the  case  of  pth  order  difference  model 
because  of  its  nonstationarity.  To  avoid  this  difficulty,  Harvey  (1981)  suggests  that 
one  can  choose 

P0  =  A  / 


where  A  is  large  scalar  and  I  is  the  identity  matrix.  Using  this  initial  value,  one  can 
ignore  the  first  k  observations  in  calculating  the  log  likelihood  function  for  some  small 
integer  k.  Throughout  the  paper,  A  is  chosen  to  be  the  square  of  the  range  of  the 
data  and  xo  =  0.  The  log  likelihood  function  (19)  is  then  defined  as 


'(<*«?>-»  1o*(2t)+  E  1o«*?KV?)+  E 

»=fc+l  i=Jfc+ 1  Si  \Coiai) 


To  robustify  the  ML  approach  in  the  case  of  additive  outliers  which  occur  when 
7  >  0  in  (1),  we  propose  that  the  parameters  and  of  be  estimated  by  minimizing 
the  ‘robustified’  log  likelihood  type  loss  function 


\  =  YL  +  Yj  PciO 

»=fc+ 1  i—k+\ 


is  the  Huber  (1964)  loss  function  with  “tuning”  constant  cx. 

In  our  examples  and  applications  to  radar  glint  noise,  we  have  chosen  the  integer 
k  to  be  about  5  percent  of  the  total  number  of  observations  and  c\  is  chosen  to  be 
1.2  as  suggested  by  the  study  of  Martin  and  Su  (1985). 

The  exact  calculation  of  estimates  b\  and  cr]  which  minimizes  (22),  in  closed 
form  is  intractable,  and  one  needs  to  employ  some  numerical  methods  for  minimizing 
l.  Since  we  only  have  to  estimate  two  parameters,  a  simple  brute  force  numerical 
method  is  possible.  Evaluate  the  function  l  on  a  grid  of  values  of  of  and  of,  and 
choose  the  grid  value  that  minimizes  /.  This  is  the  method  we  use  for  our  examples 
and  applications. 
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When  the  model  for  the  data  is  correct,  the  log  likelihood  function  should  behave 
nicely,  that  is,  it  is  approximately  convex  and  locally  quadratic  near  the  minimum 
so  that  the  minimum  is  easy  to  be  detected.  Likewise,  one  expects  the  robust  loss 
function./  to  have  similar  behaviour  if  the  model  is  correct.  However,  if  the  model 
is  incorrect,  the  log  likelihood  and  robust  loss  functions  may  exhibit  poor  behaviour, 
e.g.,  there  may  not  be  a  clear  minimum  of  the  function.  This  behaviour  is  clearly 
exhibited  in  the  application  of  the  method  to  some  artificial  data  sets  and  the  radar 
glint  noise  data. 

In  the  next  subsection,  we  present  two  examples  to  illustrate  how  the  method 
works.  We  use  artificial  data  sets  in  these  examples,  one  with  the  nominal  variance 
in  (1)  <y\  —  0  and  another  with  al  >  0.  We  also  examine  how  the  robust  loss 
function  /  behaves  when  incorrect  (mismatched)  models  are  used  in  these  examples. 

4.2  Examples 

Example  1:  (<72  —  0) 

A  sample  of  200  data  points  is  generated  from  a  contaminated  autoregressive 
model  of  order  one  with  <f>  =  1  (i.e.  the  first  order  difference  model)  and  i0  =  0. 
The  model  is 


xt  =  xt_!  +  et 
yt  =  xt  +  vt 

where  et  is  white  noise  standard  Gaussian  process  and  vt  is  generated  from  (1)  with 
(?l  =  0,  7  =  .1  and  H  ~  /V(0, 25).  Thus  there  is  no  nominal  additive  Gaussian 
noise  to  contend  with,  and  /  =  /(<72)  depends  on  just  one  parameter,  the  innovation 
variance  al . 

The  data  are  plotted  in  Figure  l.a.  The  plot  shows  that  there  are  big  spikes  in 
the  data  and  the  nonstationarity  of  the  data  is  rather  obvious. 

The  loss  function  /(cr2)  was  evaluated  for  100  values  of  <72  equally  spaced  from  .1 
to  4.  The  results,  displayed  in  Figure  l.b,  show  that  the  function  is  quite  smooth  and 
essentially  convex.  The  minimum  value  of  /(a2)  is  342.6,  occurring  at  d2  =  1.30. 
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To  illustrate  how  the  robust  loss  function  /  behaves  when  the  incorrect  model 
is  used,  we  repeat  the  above  process  with  2nd  and  3rd  order  difference  models  used 
for  constructing  the  predictor  y-_1.  The  values  of  /  corresponding  to  the  incorrect 
assumptions  of  2nd  and  3rd  order  difference  models  are  plotted  in  Figures  l.c  and  l.d 
respectively.  Not  only  are  the  minimum  values  of  /,  based  on  incorrect  assumptions 
of  2nd  and  3rd  order  difference  models  considerably  larger  than  that  when  the  correct 
model  is  used,  l  also  looses  its  smoothness  and  convexity  when  the  incorrect  model  is 
used.  Greater  roughness  occurs  when  the  model  mismatch  is  greater.  The  extensive 
degree  of  roughness  associated  with  model  mismatch  surprised  us  a  bit.  Perhaps  this 
behaviour  provides  a  tipoff  that  the  incorrect  model  is  being  used. 

To  demonstrate  how  well  the  fixed-lag  smoothers  work  using  the  estimated  value 
of  of,  we  display  the  data  in  Figure  2,  along  with  lag-k  smoothed  values  for  k  =  0,  2 
and  the  corresponding  state  estimation  residuals.  Figure  2.a  shows  the  observed  data 
and  the  lag-0  smoother  xj,  while  Figure  2.b  shows  the  corresponding  state  estimation 
residuals  xt  —  x\.  Note  that  under  the  assumption  that  of  =  0,  we  have  yt  =  xt 
and  x\  =  yt  much  of  the  time.  Correspondingly  the  residuals  xt  —  x\  are  zero  much 
of  the  time.  Results  for  the  lag-2  smoother  x{+2  are  shown  in  Figures  2.c  and  2.d. 

Figures  2.a  and  2.c  show  that  the  lag-0  and  lag-2  smoothers  follow  the  data  quite 
well  and  they  are  successful  in  removing  both  moderate  and  large  outliers.  Overall, 
the  state  estimation  residuals  for  the  lag-2  smoother  (Figure  2.d)  tend  to  be  smaller 
than  those  for  the  lag-0  smoother  (Figure  2.b),  as  one  would  expect. 

Example  2:  (erf  >  0) 

Again  200  observations  are  generated  by  the  same  model  as  in  the  first  example 
except  that  now  of  =  4  and  H  ~  Ar(0,50).  Thus  there  is  a  nominal  additive 
Gaussian  noise  variance  of  >  0  which  we  need  to  estimate.  So  now  /  =  /(of,  of) 
is  a  function  of  the  two  unknown  variances  erf,  of. 

The  data  is  displayed  in  Figure  3. a.  The  outliers  are  now  a  bit  less  obvious  because 
of  the  iV(0,4)  Gaussian  component  of  vt,  but  they  are  none  the  less  present.  The 
loss  function  /  was  evaluated  for  100  values  of  of  equally  spaced  from  .1  to  4  and  100 
values  of  of  equally  spaced  from  .1  to  S.  The  contours  of  this  function  are  plotted 
against  of  and  of  in  Figure  3.b.  The  plot  shows  that  the  function  is  quite  smooth  and 
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approximately  convex  so  that  the  minimum  is  quite  easily  located.  The  minimum  of 
684.4  is  obtained  at  a\  =  5.0  and  d2  =  1.3. 

In  Figures  3.c  and  3.d  respectively,  we  show  the  contours  of  the  function  l  based 
on  the  use  of  incorrect  2nd  and  3rd  order  difference  models  to  construct  y‘_1.  The 
plots  suggest  that  local  convexity  of  l  is  still  present,  and  the  roughness  of  /  associated 
with  a  mismatched  model  is  not  nearly  as  great  as  in  the  previous  example  where  we 
set  crl  =  0.  For  the  case  of  an  incorrectly  used  2nd  order  model,  the  minimum  of  / 
is  702.9,  occurring  at  d2  =  6.6,  d2  =  .035.  While  the  estimate  d2  is  increasing, 
d2  is  decreasing  and  quite  smaller  than  the  true  value  a\  —  1.  When  using  an 
incorrect  third  order  model,  the  estimate  d2  =  7.4  is  large  compared  with  the 
true  value  <r2  =  4;  however,  the  really  dramatic  effect  is  the  extremely  small  value 
d2  =  .0006.  These  results  indicate  that  when  a  mismatched  model  of  higher  order  is 
used,  the  observation  variance  will  be  overestimated  and  the  innovation  variance  will 
be  underestimated,  and  that  this  effect  will  be  greater  when  the  degree  of  mismatch 
is  greater. 

Note  that  the  larger  the  observation  variance  is,  the  “smoother”  the  conditional 
expectation  y\~l  would  be.  In  addition,  the  use  of  higher  order  difference  model 
intrinsically  leads  to  more  smoothness. 

In  Figure  4,  we  plot  the  observations  along  with  the  lag-k  smooths,  and  the  state 
estimation  residuals  for  k  =  0,  2.  The  results  are  as  one  might  expect.  The  lag-2 
smoother  (Figure  4.c)  shows  much  more  smoothness  than  the  lag-0  smoother  (Figure 
4.a).  The  state  estimation  residuals  from  the  lag-2  smoother  are  also  noticeably 
smaller  than  those  for  the  lag-0  smoother. 
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5  Application  to  Glint  Noise  Data 

5.1  Description  of  Data 

In  Figure  5,  we  display  a  segment  of  radar  glint  noise  data  encountered  in  radar 
tracking  of  aircraft  targets.  The  ordinate  values  are  the  apparent  line-of-sight  angles  of 
the  aircraft  (in  degrees),  measured  from  baresite  of  the  fixed  position  radar  dish.  The 
origin  of  the  ordinate  corresponds  to  a  right-angle  side  view  of  the  aircraft  situated 
in  a  level  position.  The  exceedingly  spiky  outlier  prone  behaviour  of  this  process  is 
due  to  interference  from  the  multiple  returns  of  the  radar  signal  reflecting  from  the 
various  components  of  the  aircraft  structure  (e.g.,  wing  leading  edges,  tails,  cockpit, 
etc.).  The  spikes  are  often  huge,  sometimes  giving  a  misindicator  of  true  angular 
position  by  as  much  as  100°  or  more.  In  spite  of  having  an  extremely  non-Gaussian 
character  (see  Hewer,  Martin  and  Zeh,  1987),  the  process  has  often  been  treated  as 
being  Gaussian  in  engineering  analyses.  Needless  to  say,  such  analyses  may  be  quite 
misleading. 

The  radar  glint  noise  also  contains  a  low-frequency  trend-like  component  which  one 
can  see  in  Figure  5  (eyeball  a  smooth  curve  through  the  data).  This  low-frequency 
component,  called  “bright  spot  wander”,  represents  the  electromagnetic  center  of 
gravity  of  the  aircraft,  which  obviously  changes  (slowly)  as  the  aircraft  rotates. 

5.2  Model  Estimation 

Prior  experiments  (Martin  and  Thompson,  1982)  with  robust  autoregressive  model 
fitting  and  robust  smoothing  indicate  that  the  trend-like  component  in  radar  glint 
noise  is  well  modeled  by  1*‘,  2nd  or  at  most  3rd  order  autoregressions  with  parameters 
on  the  boundary  of  nonstationarity.  Thus,  for  this  data  it  is  natural  to  use  the  simple 
approach  based  on  1*‘,  2nd  and  3rd  order  difference  models  and  robust  estimation  of 
the  parameters  a]  and  (7q.  We  examine  t%vo  cases,  one  with  the  assumption  =  0 
and  another  without  this  assumption.  Employing  the  estimation  procedure  described 
in  Section  4,  we  obtained  the  following  results. 

Assuming  that  <7^  =  0,  the  robust  loss  function  1  for  l,f,  2nd  and  3rd  order 
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difference  models,  are  evaluated  and  plotted  in  Figure  6,  (a)-(c)  for  lft  through  3rd 
order  difference  models,  respectively.  The  estimates  a\  and  the  values  of  the  loss 
function  l(b\ )  are  presented  in  Table  1. 

Table  1:  The  estimates  a\  and  /(of). 


Order 

%  itf) 

1“ 

2nd 

3rd 

177  3997.4 

205  4468.8 

389  5206.3 

On  the  basis  of  minimum  /,  one  is  clearly  led  to  prefer  the  first  order  model.  We 
note  also  that  whereas  the  function  /  =  /(of)  is  rather  smooth  and  nearly  convex 
for  the  1J<  order  model,  /  becomes  increasingly  rough  and  non-convex  for  the  2nd  and 
3rd  order  models.  This  is  further  evidence  that  the  2nd  and  3rd  order  models  are 
inappropriate. 

The  glint  noise  data  and  the  lag-3  smooths  corresponding  to  the  estimated  1*‘,  2nd 
and  3rd  order  difference  models  are  plotted  in  Figure  7  (a  rationale  for  the  choice  of 
lag-3  is  provided  in  the  next  section).  We  plot  only  the  first  300  data  points  so  that 
the  behaviour  of  the  smoothers  can  be  easily  examined.  Figure  7.a  shows  that  for 
the  first  order  model,  the  lag-3  smoother  is  quite  successful  in  chopping  off  the  large 
spikes/outliers,  and  otherwise  follows  the  data.  This  is  as  expected  because  of  the 
assumption  <Tq  =  0,  according  to  which  a  “data-cleaner”  behaviour  is  expected  (see, 
Martin  and  Thompson,  1982).  However,  the  smoother  perhaps  does  not  chop  off  the 
more  moderate  sized  spikes  as  effectively  as  one  might  wish. 

Figure  7.b  shows  that  for  the  second  order  model,  the  lag-3  smoother  does  not 
behave  quite  so  well.  It  overshoots  in  a  few  places,  and  is  consequently  not  as  effective 
in  chopping  off  the  spikes.  The  behaviour  of  the  lag-3  smoother  for  the  3rd  order 
model,  shown  in  Figure  7.c,  is  even  worse  in  this  regard.  The  behaviour  of  the 
lag-3  smoothers  for  the  2nd  and  3rd  order  difference  models  are  consistent  with  our 
observation  that  these  models  fit  poorly  using  the  minimized  /  as  a  criterion,  under 


the  assumption  that  <72  =  0. 

We  now  consider  the  case  where  a\  >  0  is  treated  as  an  unknown,  and  the  two 
parameters  and  <r2  are  estimated  by  minimizing  the  robust  loss  function  l.  The 
estimates  d2  and  d2  and  the  values  of  the  loss  function  l(d% ,  d2)  are  presented  in 
Table  2. 

Table  2:  The  estimates  d2,  a 2,  and  /(d2,d2). 


Order 

% 

faS.  *?) 

1“ 

57.2 

73.10 

3968.5 

2nd 

129.4 

1.62 

4073.3 

3rd 

154.1 

.04 

4176.6 

The  contours  of  /  are  plotted  in  Figures  8a,  b,  c  for  the  1*‘,  2nd  and  3rd  order 
difference  models  respectively.  As  in  the  artificial  Example  2  of  Section  4,  the  surface 
appears  most  smooth  and  nearly  convex  for  the  first  order  model,  with  some  degra¬ 
dation  in  this  respect  for  the  2nd  and  3rd  order  models.  On  the  basis  of  minimum  l, 
the  first  order  model  is  again  preferred.  Using  this  model,  the  robust  loss  function  at¬ 
tains  a  minimum  value  of  3968.5,  which  is  smaller  than  the  minimum  value  of  3997.4 
obtained  in  the  case  when  it  was  assumed  that  —  0.  Thus  from  the  robust  loss 
function  point  of  view  the  assumption  >  0  leads  to  a  better  model  fit. 

The  original  glint  noise  data  (first  300  points),  and  results  of  applying  the  lag-3 
smooths  are  displayed  in  Figures  9a-c  for  the  \,t-3rd  order  models,  respectively.  The 
lag-3  smooths  in  Figure  9a  for  the  first  order  difference  with  d2  =  57.2  exhibit 
more  smoothness  than  those  in  Figure  7a  for  a  first  model  difference  model  with 
the  assumption  <r2  =  0.  Note  however  that  for  the  2nd  and  3rd  order  models,  the 
combination  of  higher  order  and  larger  ratio  d2/d2  of  estimated  variances  results  in 
considerably  increased  smoothness. 

Actually,  if  one  takes  a  “smoothing  for  visual  display”  point  of  view,  the  second  or¬ 
der  model  lag-k  smoother  shown  in  Figure  9.b  provides  a  most  pleasing  presentation: 
It  fits  a  more  smooth  curve  through  the  data  than  in  the  case  of  the  first  order  differ- 
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ence  model,  and  the  use  of  a  third  order  difference  model  yields  little  improvement 
over  the  second  order  model. 

6  Choice  of  Lag 

In  the  previous  section  we  displayed  results  for  lag-3  smoothing  of  radar  glint  noise. 
The  choice  of  lag  3  for  this  particular  example  was  determined  empirically  by  ex¬ 
amining  the  quality  of  the  smoother  for  various  lags.  The  reason  for  this  is  that  an 
analytical  approach  for  choosing  k  is  not  feasible  for  robust  smoothers  because  of  their 
nonlinear  structures  (by  way  of  contrast,  in  the  case  of  the  linear  Kalman  smoother, 
the  lag  k  might  be  chosen  to  achieve  a  mean-squared  error  within  a  specified  factor 
of  the  fixed  interval  smoother  mean  squared  error,  the  latter  being  the  best  that  one 
can  do). 

Figures  10  and  11  display  robust  lag-k  smoothers  for  k  =  0,  1,  3,  5  for  the  first 
and  second  order  difference  models  respectively  where  both  <j\  and  <j\  are  estimated. 
For  the  first  order  model  there  is  not  a  great  deal  of  difference  in  the  quality  of  the 
smooth  for  various  lags,  there  being  only  a  slight  amount  of  increased  smoothness 
with  increasing  lag.  For  the  second  order  model  there  is  a  noticeable  increase  in 
smoothness  with  increasing  lag  k,  and  most  of  this  increased  smoothness  seems  to 
be  achieved  at  k  =  3.  Thus  we  displayed  the  k  =  3  results  for  both  first  and  second 
order  models  in  the  previous  section. 

To  study  the  behaviour  of  the  fixed-lag  smoothers  for  different  lags  in  a  more 
systematic  way,  we  carry  out  a  simulation  study  as  follows.  We  generated  50  time 
series  of  length  200  from  the  process 


Vt  =  xt  +  vt 

where  xt  is  a  first  order  autoregression  with  standard  normal  innovations  et,  and  vt  is 
a  contamination  process  containing  outliers. 

In  this  simulation  experiment  we  introduce  a  new  consideration,  namely  the  pos¬ 
sibility  of  patchy  outliers  of  various  lengths.  The  basic  question  to  be  addressed  in 
this  regard  is  whether  the  use  of  longer  lags  k  in  the  lag-k  robust  smoother  would  be 
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helpful  in  dealing  with  patchy  outliers.  Thus  we  construct  outlier  patches  of  lengths 
1,  2  and  3  by  the  following  way.  For  each  sample  path,  the  process  xt  are  contami¬ 
nated  by  non-overlapped  patches  of  outliers,  independently  generated  from  iV(0,25). 
The  positions  of  the  patches  are  randomly  chosen.  For  all  three  patch  lengths,  there 
are  20%  of  outliers  in  total. 

The  robust  lag-k  smoothers  performance  is  evaluated  by  computing  for  each  sam¬ 
ple  path  the  mean-squared  error  for  smoothing  at  outlier  positions  ( MSEi )  and 
non-outlier  positions  ( MSE2 )  separately: 


MSEi 

1 

-  xt)2 

,5^0  1 

mse2 

1 

E  (*:+‘ 

(:vt=0 

-  xt)2. 

X^t:t>e=0  1 

The  above  mean-squared  errors  are  computed  for  each  of  the  50  sample  paths. 

The  boxplots  of  both  MSEs  for  lag  k,  k  =  0,  1,...,5  are  displayed  in  Figure  12  for 
patches  of  length  1,  2  and  3.  The  results  for  MSEi  are  shown  in  Figure  12a,  b,  c, 
while  the  MSE2  results  are  shown  in  Figure  12e,  f,  g. 

These  plots  show  that: 

(a)  Both  MS E\  and  MSE2  increase  with  increasing  patch  length. 

(b)  MSE\  has  a  nearly  symmetric,  distribution  (with  an  upper  tail  slightly  heavier 

than  the  lower  tail)  for  patches  of  length  one,  and  an  increasingly  asymmetric 
distribution  skewed  to  the  right  for  increasing  patch  length. 

(c)  MSE2  has  an  asymmetric  distribution,  with  an  upper  tail  considerably  heavier 

than  the  lower  tail,  for  all  3  patch  lengths. 

(d)  For  all  these  patch  lengths  both  MSEi  and  MSE2  essentially  reach  their  opti¬ 

mum,  namely  a  distribution  must  compact  toward  zero,  by  lag  k  =  3. 

(e)  MSE2  does  not  settle  to  zero  with  increasing  lag. 

The  behaviour  (e)  has  to  do  with  the  fact  that  the  presence  of  outliers  has  an 
effect  on  the  smoother  at  non-outliers  positions.  While  this  behaviour  is  not  entirely 
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unavoidable,  one  wonders  whether  robust  lag-k  smoothers  can  be  constructed  with 
smaller  mean-squared  error  at  non-outlier  positions. 

The  result  (d)  is  consistent  with  our  choice  of  k  =  3  for  the  lJt  order  difference 
model  lag-k  smoothing  of  radar  glint  noise  in  Section  5.  To  make  sure  that  k  =  3 
is  reasonable  for  the  2nd  order  difference  model  we  should  probably  run  the  above 
experiment  based  on  such  a  model.  More  generally,  this  type  of  simulation  would 
appear  to  provide  useful  guidelines  for  selecting  k  for  robust  lag-k  smoothing  for  a 
wide  range  of  state  space  models. 

7  Sensitivity  To  Model  Mismatch 

We  have  proposed  a  simple  approach  to  robust  smoothing  which  involves  estimation  of 
only  two  parameters  by  virtue  of  assuming  that  the  core  part  of  the  data  is  well  mod¬ 
eled  by  low  order  difference  models,  thereby  avoiding  the  estimation  of  autoregression 
(AR)  or  autoregression-moving  average  (ARMA)  parameters.  One  wonders  how  the 
method  will  work  when  the  core  part  of  the  data  in  fact  comes  from  a  stationary  AR 
or  ARMA  model,  with  parameters  “reasonably  near”  the  boundary  of  nonstationary. 
To  get  a  feeling  for  how  our  approach  would  work  under  such  “mismatch”  conditions 
with  regard  to  the  estimation  of  a2  and  a2,  we  carry  out  a  simulation  study  for  the 
AR(1)  case. 

Specifically  we  generate  xt  as  a  first-order  Gaussian  autoregression  with  (f>  =  .5, 
.9,  .95,  .98,  1  and  standard  normal  innovations  et,  and  use  the  estimation  approach  of 
Section  4  assuming  a  first  difference  model.  The  xt  are  contaminated  with  additive 
outliers  vt  which  has  the  distribution  (1)  with  nominal  variance  <Jq  =  0  (and  so  we 
do  not  estimate  <72)  and  H  ~  A^O,  25).  The  contamination  fraction  7  is  chosen  to 
be  0,  .1,  .2  and  .3.  For  each  combination  of  7  and  <£,  100  time  series  of  length  200 
are  generated.  For  each  series,  the  robust  loss  function  /(<72)  is  evaluated,  and  the 
estimate  <72  which  minimizes  the  loss  function  is  obtained.  The  means  and  standard 
deviations  of  the  estimates  a\  are  presented  in  Table  3. 

The  results  show  that  the  means  (and  of  course  the  standard  deviations  as  well) 
increase  with  an  increasing  fraction  contamination  7.  The  means  and  the  standard 
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Table  3:  Estimates  of  Ecr]  obtained  via  Monte  Carlo  (sample  standard  deviations  of 
estimates  in  paratheses) 


True  0 

7 

=  0 

7  =  .1 

7  =  -2 

7  = 

=  .3 

0.50 

.91 

(.16) 

1.51  (.36) 

2.24  (.48) 

3.41 

(.96) 

0.90 

.92 

(.13) 

1.43  (.27) 

2.20  (.47) 

3.45 

(.82) 

0.95 

.94 

(.12) 

1.47  (.27) 

2.28  (.53) 

3.40 

(.82) 

0.98 

.96 

(.14) 

1.47  (.25) 

2.27  (.57) 

3.39 

(.88) 

1.00 

.93 

(.13) 

1.46  (.29) 

2.22  (.50) 

3.33 

(.77) 

deviations  are  relatively  constant  with  respect  to  changing  0.  This  suggests  that 
our  method  is  quite  insensitive  to  model  mismatch  as  far  as  estimation  of  of  goes, 
when  it  is  known  that  a f  =  0.  Except  for  the  case  0%  contamination  in  which  the 
vstimated  values  are  biased  downward,  the  estimated  values  of  for  other  cases  are 
biased  upward.  Although  the  biases  in  the  estimates  of  of  appear  to  be  large  when 
7  >  0  ,  the  biases  of  o,  as  an  estimate  of  the  standard  deviation  oe  are  not  nearly  so 
large. 

It  is  customary  in  robust  estimation  that  biased  estimates  are  standardized  so  as 
to  be  unbiased  at  the  nominal  model  (i.e.,  0  =  1  and  7  =  0  in  this  case).  This 
standardization  is  accomplished  in  the  present  case  by  dividing  the  estimate  of  by  its 
asymptotic  value  J3'00of .  To  do  this  one  needs  to  know  the  asymptotic  value  i^of  of 
the  estimate  at  the  nominal  model.  Sometimes,  it  is  possible  to  calculate  such  values 
analytically  or  using  numerical  integration  (see,  e.g.,  Hampel  et  al,  1986).  This  is  not 
possible  in  the  present  case.  However,  asymptotically  a  sufficiently  accurate  estimate 
could  be  obtained  by  a  very  long  simulation  study  in  this  case. 

One  should  also  study  the  effect  of  model  mismatch  on  (i)  joint  estimation  of  of 
and  of,  and  (ii)  lag-k  smoothing  mean-squared  error.  We  expect  to  do  this  in  the 
near  future. 


8  Concluding  Remarks 


For  on-line  applications  of  our  method,  the  crude  grid-search  approach  to  optimization 
needs  to  be  replaced  by  an  on-line  gradient  or  Newton  type  approach.  This  may  be 
accomplished  via  a  stochastic  approximation  type  algorithm  (see  for  example,  Albert 
and  Gardner,  1967). 

In  view  of  the  perhaps  uncomfortably  large  biases  for  0}  exhibited  in  Table  3  for 
7  >  0,  we  wonder  whether  a  more  robust  estimate  of  0]  can  be  obtained  by  some 
modification  of  the  robust  loss  function  approach  described  in  Section  4. 

References 

[1]  Albert,  A.E.  and  Gardner,  L.A.  (1967)  Stochastic  Approximation  and  Nonlinear 
Regression.  The  M.I.T.  Press,  Cambridge,  Massachusetts. 

[2]  Anderson,  B.D.O.  and  Moore,  J.B.  (1979)  Optimal  Filtering.  Prentice-Hall,  En¬ 
glewood  Cliffs,  NJ. 

[3]  Fraiman,  R.  and  Martin,  R.D.  (1988)  “Approximate  conditional  mean  filter  for 
non-Gaussian  noise”,  Technical  Report,  University  of  Washington. 

[4]  Hampel,  F.R.,  Ronchetti,  E.M.,  Rousseeuw,  P.J.  and  Stahel,  W.A.  (1986)  Robust 
Statistics:  The  Approach  Based  on  Influence  Functions.  Wiley. 

[5]  Harvey,  A.C.  (1981)  Time  Series  Models.  Halsted  Press,  New  York. 

[6]  Hewer,  G.A.,  Martin,  R.D.  and  Zeh,  J.E.  (1987)  “Robust  preprocessing  of  Kalman 
filtering  of  glint  noise”,  IEEE. -Trans,  on  Aerospace  and  Electronic  System,  AES- 
23,  No.  1,  pp.  120-128. 

[7]  Huber,  P.J.  (1964)  “Robust  estimation  of  a  location  parameter”,  Annals  of  Math¬ 
ematical  Statistics.  35,  pp.  73-101. 

[8]  Huber,  P.J.  (1981)  Robust  Statistics.  Wiley,  New  York,  NY. 


[9]  Jazwinsld,  A.H.  (1970)  Stochastic  Processes  and  Filtering  Theory.  New  York:  Aca¬ 
demic  Press. 

[10]  Martin,  R.D.  (1979)  “Approximate  conditional-mean  type  smoothers  and  inter¬ 
polators”,  in  Smoothing  Techniques  for  Curve  Estimation,  edited  by  T.  Gasser 
and  M.  Rosenblatt.  Springer,  Berlin,  pp.  117-143. 

[11]  Martin,  R.D.  (1981)  “Robust  smoothers  and  outliers  interpolators”.  Unpublished 
manuscript. 

[12]  Martin,  R.D.  and  Su,  K.Y.  (1985)  “Robust  Filters  and  Smoothers:  Definition 
and  Design”,  Technical  Report  No.  58,  University  of  Washington. 

[13]  Martin,  R.D.  and  Thomson,  D.J.  (1982)  “Robust- Resistant  Spectrum  Estima¬ 
tion”,  Proceedings  of  the  IEEE,  Vol.  70,  No.  9,  pp.  1097-1115. 

[14]  Masreliez,  C.J.  (1975)  “Approximate  non-Gaussian  Filtering  with  Linear  State 
and  Observation  Relation”,  IEEE-Trans.  Automat.  Contr.,  Vol.  AC-20,  pp.  361- 
371. 

[15]  Meditch,  J.S.  (1969)  Stochastic  Optimal  Linear  Estimation  and  Control  New 
York:  McGraw-Hill. 

[16]  Moore,  J.B.  (1973)  “Discrete-Time  Fixed-Lag  Smoothing  Algorithms”,  Auto- 
matica,  Vol.  9,  No.  2,  pp.  163-173. 

[17]  Stuck,  B.W.  (1976)  “Minimum  Error  Dispersion  Linear  Filtering  of  Symmetric 
Stable  Processes”,  IEEE-Trans.  Automat.  Contr.,  Vol.  17,  pp.  507-509. 

[18]  Wecker,  W.E.  and  Ansley,  C.F.  (1983)  “The  Signal  Extraction  Approach  to 
Nonlinear  Regression  and  Spline  Smoothing”,  JASA,  78,  pp.  81-89. 


Robust  Loss  Function  Data 


FIGURE  1 :  DATA  AND  ROBUST  LOSS  FUNCTION 
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Figure  2:  DATA  (— )  AND  FIXED-LAG  SMOOTHERS  (— ) 
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FIGURE  3:  DATA  AND  CONTOURS  OF  THE  ROBUST  LOSS  FUNCTIONS 
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Figure  4:  DATA  (— )  AND  FIXED-LAG  SMOOTHERS 
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Figure  5:  RADAR  GLINT  NOISE 
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FIGURE  6:  THE  ROBUST  LOSS  FUNCTION  FOR  RADAR  GLINT  NOISE 
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FIGURE  8:  CONTOURS  OF  THE  ROBUST  LOSS  FUNCTION  FOR  RADAR  GLINT  NOISE 


o 


20  40  60  80  100  120 


Innovation  variance 

(a)  FIRST  ORDER  DIFFERENCE  MODEL 


Innovation  variance 

(b)  SECOND  ORDER  DIFFERENCE  MODEL 


o 


00  0.1  0.2  0.3  0.4  0.5 


Innovation  variance 
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(d)  PATCH  =.  t  AT  NON-OUTLIERS.  (•)  PATCH  =  2  AT  NON-OUTLIERS.  (f)  PATCH  =  3  AT  NON-OUTLIERS. 


